Crawling in Rogue's dungeons with (partitioned) A3C

Author: A Asperti
MG Bellemare
R Sun
RS Sutton
V Cerny
V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2018

Rogue is a famous dungeon-crawling video game of the 1980s, the ancestor of its genre. Rogue-like games are known for requiring the player to explore partially observable, randomly generated labyrinths that differ on every run, which prevents any form of level replay. As such, they serve as a very natural and challenging task for reinforcement learning, requiring the acquisition of complex, non-reactive behaviors involving memory and planning. In this article we show how, by exploiting a version of A3C partitioned on different situations, the agent is able to reach the stairs and descend to the next level in 98% of cases. Comment: Accepted at the Fourth International Conference on Machine Learning, Optimization, and Data Science (LOD 2018)

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Recommended from our members

Performance Enhancement of Deep Reinforcement Learning Networks using Feature Extraction

Author: CJ Watkins
D Silver
G Tesauro
GE Hinton
MG Bellemare
R Bellman
V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

The combination of Deep Learning and Reinforcement Learning, termed Deep Reinforcement Learning Networks (DRLN), offers the possibility of using a Deep Learning Neural Network to produce an approximate Reinforcement Learning value table that allows extraction of features from neurons in the hidden layers of the network. This paper presents a two-stage technique for training a DRLN on features extracted from a DRLN trained on an identical problem, via the implementation of the Q-Learning algorithm, using TensorFlow. The results show that the extraction of features from the hidden layers of the Deep Q-Network improves the learning process of the agent (4.58 times faster and better) and proves the existence of encoded information about the environment which can be used to select the best action. The research contributes preliminary work in an ongoing research project in modeling features extracted from DRLNs.

City Research Online

Crossref

Identifying Critical States by the Action-Based Variance of Expected Return

Author: CJ Watkins
G Liu
IH Witten
M Stolle
MG Bellemare
SJ Kazemitabar
V Mnih
Y Kuniyoshi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/11/2020

The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulty deciding when to choose exploitation, as well as extracting useful points for a brief explanation of the agent's operation. One reason for these difficulties is that such approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is beneficial to both problems. These critical states are the states at which the action selection substantially changes the potential for success or failure. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a grid world with cliffs and in two baseline tasks of deep RL. Our results also demonstrate that the identified critical states are intuitively interpretable regarding the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing of the identification of especially critical states and the rapid progress of learning suggests that there are a few especially critical states that hold important information for accelerating RL rapidly. Comment: 12 pages, 6 figures
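
The idea in this abstract, flagging a state as critical when its Q-values vary strongly across actions and then exploiting with high probability there, can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed `threshold`, the `epsilon` and `exploit_prob` values, and the function names are all hypothetical, and the paper's actual criterion for "critical" may differ from a plain variance cutoff.

```python
import random
from statistics import pvariance


def is_critical(q_values, threshold):
    """Flag a state whose Q-values differ strongly across actions.

    `threshold` is a hypothetical fixed cutoff chosen for illustration.
    """
    return pvariance(q_values) > threshold


def select_action(q_values, threshold, epsilon=0.3, exploit_prob=0.95, rng=random):
    """Epsilon-greedy selection that exploits more often in critical states.

    `epsilon` and `exploit_prob` are illustrative values, not taken from the paper.
    """
    # Greedy action: index of the largest Q-value (first one on ties).
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    # Critical states use the high exploitation probability;
    # all other states fall back to ordinary epsilon-greedy.
    if is_critical(q_values, threshold):
        p_exploit = exploit_prob
    else:
        p_exploit = 1.0 - epsilon
    if rng.random() < p_exploit:
        return greedy
    # Otherwise explore uniformly at random over all actions.
    return rng.randrange(len(q_values))
```

In a cliff-style grid world, a state next to the cliff would typically have one action with a much lower Q-value than the rest, so its variance is large and the sketch above would mostly pick the greedy action there.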

arXiv.org e-Print Archive

Crossref

C-tests revisited: back and forth with complexity

Author: B Hibbard
J Hernández-Orallo
MG Bellemare
RJ Solomonoff
S Legg
T Schaul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/07/2015

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21365-1_28

We explore the aggregation of tasks by weighting them using a difficulty function that depends on the complexity of the (acceptable) policy for the task (instead of a universal distribution over tasks or an adaptive test). The resulting aggregations and decompositions are (now retrospectively) seen as the natural (and trivial) interactive generalisation of the C-tests.

This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013.

Hernández Orallo, J. (2015). C-tests revisited: back and forth with complexity. In Artificial General Intelligence, 8th International Conference, AGI 2015, Berlin, Germany, July 22-25, 2015, Proceedings. Springer International Publishing. 272-282. https://doi.org/10.1007/978-3-319-21365-1_28

Crossref

RiuNet

Deep Reinforcement Learning: An Overview

Author: AG Barto
D Ormoneit
F Sehnke
G Tesauro
H-G Beyer
J Kober
J Schmidhuber
LP Kaelbling
MG Bellemare
P Vincent
RS Sutton
S Hochreiter
SS Mousavi
V Mnih
W Böhmer
Y Bengio
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/06/2018

In recent years, a specific machine learning method called deep learning has attracted huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning, with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks and recurrent neural networks, which have been successfully combined with the reinforcement learning framework. Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201

arXiv.org e-Print Archive

Crossref

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

Author: A Aydemir
A Elfes
A Giusti
A Tampuu
B Kuipers
C Cadena
C Tomasi
CL Giles
FS Melo
J Canny
JK Gupta
K Konolige
L Busoniu
L Panait
LE Kavraki
M Jaderberg
MG Bellemare
RC Smith
RS Sutton
S Daftry
S Hochreiter
V Mnih
Publication date: 09/07/2020

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync. Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-syn

arXiv.org e-Print Archive

Crossref

Online Continual Learning on Sequences

Author: BM Lake
FM Richardson
GI Parisi
GL Ming
HS Kudrimoti
JB Aimone
JL Elman
K Holyoak
KA Krueger
L Mici
L Oneto
M Baccouche
M Everingham
M Jung
M Karlsson
MG Bellemare
RM French
S Fusi
S Grossberg
S Ji
S Marsland
W Deng
Z Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/03/2020

Online continual learning (OCL) refers to the ability of a system to learn over time from a continuous stream of data without having to revisit previously encountered training samples. Learning continually in a single data pass is crucial for agents and robots that operate in changing environments and are required to acquire, fine-tune, and transfer increasingly complex representations from non-i.i.d. input distributions. Machine learning models that address OCL must alleviate catastrophic forgetting, in which hidden representations are disrupted or completely overwritten when learning from streams of novel input. In this chapter, we summarize and discuss recent deep learning models that address OCL on sequential input through the use (and combination) of synaptic regularization, structural plasticity, and experience replay. Different implementations of replay have been proposed that alleviate catastrophic forgetting in connectionist architectures via the re-occurrence of (latent representations of) input sequences and that functionally resemble mechanisms of hippocampal replay in the mammalian brain. Empirical evidence shows that architectures endowed with experience replay typically outperform architectures without it in (online) incremental learning tasks. Comment: L. Oneto et al. (eds.), Recent Trends in Learning From Data, Studies in Computational Intelligence 89

arXiv.org e-Print Archive

Crossref

Increasing generality in machine learning through procedural content generation

Author: A Cully
A Radford
A Soltoggio
A Summerville
AM Smith
C Browne
D Perez-Liebana
EJ Hastings
GI Parisi
J Togelius
K Perlin
M Cook
M Mateas
M Jaderberg
MG Bellemare
N Hansen
OM Andrychowicz
R Storn
S Risi
S Thrun
T Elsken
TS Ra
X Cui
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Procedural Content Generation (PCG) refers to the practice, in videogames and other games, of generating content such as levels, quests, or characters algorithmically. Motivated by the need to make games replayable, as well as to reduce authoring burden, limit storage space requirements, and enable particular aesthetics, a large number of PCG methods have been devised by game developers. Additionally, researchers have explored adapting methods from machine learning, optimization, and constraint solving to PCG problems. Games have been widely used in AI research since the inception of the field, and in recent years have been used to develop and benchmark new machine learning algorithms. Through this practice, it has become more apparent that these algorithms are susceptible to overfitting. Often, an algorithm will not learn a general policy, but instead a policy that will only work for a particular version of a particular task with particular initial parameters. In response, researchers have begun exploring randomization of problem parameters to counteract such overfitting and to allow trained policies to more easily transfer from one environment to another, such as from a simulated robot to a robot in the real world. Here we review the large amount of existing work on PCG, which we believe has an important role to play in increasing the generality of machine learning methods. The main goal here is to present RL/AI with new tools from the PCG toolbox, and the secondary goal is to explain to game developers and researchers a way in which their work is relevant to AI research.

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

AI to enhance interactive simulation-based training in resuscitation medicine

Author: AG G EM R, Champion H
Bellomo R Goldsmith D, Uchino S
Brisk R
Confidential N
CW C Soar J, Aibiki M
FF A Santana N
Hogan H Healey F, Neale G, Thomson R, Vincent C, Black N
JP N Soar J, Smith G
Kaneva B Torralba A, Freeman W
Kolb D
Li W Fritz M
MG H Little L
Mnih V Badia A, Mirza P, Graves A, Lillicrap T, Harley P, Silver D, Kavukcuoglu K
Mnih V Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M
Mnih V Kavukcuoglu K, Silver D, Rusu A, Veness J, Bellemare M, Graves A, Riedmiller M, Fidjeland A, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D
National Institute for Health and Clinical Excellence
NE S AG G, SA R
Perkins G Kimani P, Bullock I
Pishchulin L Jain A, Wojek C, Andriluka M, Thormaehlen T, Schiele B
RM S Niles D, Meaney P
Schneider M Rittle-Johnson B, Star J
Silver D
Sutton R Barto A
Thomson R Leuttel D, Healey F, Scobie S
Wang S Summers R
Young G
Publication venue: 'BCS Learning and Development Limited'
Publication date: 10/05/2018

Crossref

Ulster University's Research Portal

Trappin-2/Elafin Modulate Innate Immune Responses of Human Endometrial Epithelial Cells to PolyI∶C

Author: A Bellemare
A Nazli
A Rapista
A Rebbapragada
A Roghanian
AA Ashkar
AE King
AG Drannik
AJ Bett
AJ Simpson
Anna G. Drannik
Bethany M. Henrick
CC Taggart
CD Bingle
CE Samuel
Clive M. Gray
CM Bauer
CR Wira
CT Pham
DS Hobbs
E Hazrati
EE Freeman
F Semple
F Terenzi
G Ma
H Kato
H Sakurai
J Drenth
J Guo
J Schalkwijk
JE Piletz
Jean-Michel Sallenave
JM Sallenave
JP Motta
JV Fahey
JW McMichael
K Baranger
K Nara
Kakon Nag
Kenneth L. Rosenthal
KL Mossman
L Alexopoulou
L Shaw
L Steinstraesser
M Ghosh
M Hasan
M Lieber
M Tsunemi
M Vandermeeren
M Yoneyama
ME Quinones-Mateu
MG Wathelet
ML Zani
MS Mulligan
MW Butler
N Guyot
NG McElvaney
O Wiedow
P Paladino
PA Henriksen
PJ McLaren
R Alvarez
R Lin
R Pfundt
RN Harty
RS Noyce
RT Lester
SJ McAlhany
SJ-M Vergnolle N
SM Iqbal
SR Alam
T Maniatis
T Moreau
TM Schaefer
TS Wilkinson
V MasCasullo
W Erhart
W Zapata
Xiao-Dan Yao
Y Li
Z Dogusan
Publication venue: Public Library of Science
Publication date: 24/04/2012

BACKGROUND: Upon viral recognition, innate and adaptive antiviral immune responses are initiated by genital epithelial cells (ECs) to eradicate or contain viral infection. Such responses, however, are often accompanied by inflammation that contributes to acquisition and progression of sexually transmitted infections (STIs). Hence, interventions/factors enhancing antiviral protection while reducing inflammation may prove beneficial in controlling the spread of STIs. Serine antiprotease trappin-2 (Tr) and its cleaved form, elafin (E), are alarm antimicrobials secreted by multiple cells, including genital epithelia. METHODOLOGY AND PRINCIPAL FINDINGS: We investigated whether and how Tr and E (Tr/E) each contribute to antiviral defenses against a synthetic mimic of viral dsRNA, polyinosine-polycytidylic acid (polyI:C), and vesicular stomatitis virus. We show that delivery of a replication-deficient adenovector expressing Tr gene (Ad/Tr) to human endometrial epithelial cells, HEC-1A, resulted in secretion of functional Tr, whereas both Tr/E were detected in response to polyI:C. Moreover, Tr/E were found to significantly reduce viral replication by either acting directly on virus or through enhancing polyI:C-driven antiviral protection. The latter was associated with reduced levels of pro-inflammatory factors IL-8, IL-6, TNFα, lowered expression of RIG-I, MDA5 and attenuated NF-κB activation. Interestingly, enhanced polyI:C-driven antiviral protection of HEC-Ad/Tr cells was partially mediated through IRF3 activation, but not associated with higher induction of IFNβ, suggesting multiple antiviral mechanisms of Tr/E and the involvement of alternative factors or pathways. CONCLUSIONS AND SIGNIFICANCE: This is the first evidence of both Tr/E altering viral binding/entry, innate recognition and mounting of antiviral and inflammatory responses in genital ECs that could have significant implications for homeostasis of the female genital tract.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central